99 research outputs found

Acoustic Modelling for Under-Resourced Languages

Author: Stüker Sebastian
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2009

Automatic speech recognition systems have so far been developed for only a few of the 4,000-7,000 existing languages. In this thesis we examine methods to rapidly create acoustic models for new, possibly under-resourced languages in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.

KITopen

Multilingual Adaptation of RNN Based ASR Systems

Authors: Müller Markus, Stüker Sebastian, Waibel Alex
Publication date: 27/02/2018

In this work, we focus on multilingual systems based on recurrent neural networks (RNNs), trained using the Connectionist Temporal Classification (CTC) loss function. Using a multilingual set of acoustic units poses difficulties. To address this issue, we previously proposed Language Feature Vectors (LFVs) to train language-adaptive multilingual systems. Language adaptation, in contrast to speaker adaptation, needs to be applied not only at the feature level but also to deeper layers of the network. We therefore extend our previous approach by introducing a novel technique which we call "modulation": we modulate the hidden layers of RNNs using LFVs. We evaluated this approach in both full and low-resource conditions, as well as for grapheme-based and phone-based systems. Modulation achieved lower error rates across the different conditions.
Comment: 5 pages, 1 figure, to appear in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)

arXiv.org e-Print Archive

Crossref

Multilingual Articulatory Features

Author: Stüker Sebastian
Publication date: 04/08/2008

KITopen

Automatic Generation of Pronunciation Dictionaries - for New, Unseen Languages by Voting Phoneme Recognizers in Nine Different Languages (Studienarbeit)

Author: Stüker Sebastian
Publication date: 04/08/2008

KITopen

Multi-stage Large Language Model Correction for Speech Recognition

Authors: Nguyen Thai-Son, Pu Jie, Stüker Sebastian
Publication date: 17/10/2023

In this paper, we investigate the use of large language models (LLMs) to improve the performance of competitive speech recognition systems. Unlike traditional language models, which focus on a single data domain, LLMs offer the opportunity to push the limit of state-of-the-art ASR performance while achieving higher robustness and generalizing effectively across multiple domains. Motivated by this, we propose a novel multi-stage approach that combines traditional language model re-scoring and LLM prompting. Specifically, the proposed method has two stages: the first stage uses a language model to re-score an N-best list of ASR hypotheses and runs a confidence check; the second stage prompts an LLM to perform ASR error correction on the less confident results from the first stage. Our experimental results demonstrate the effectiveness of the proposed method, showing a 10% to 20% relative improvement in WER over a competitive ASR system across multiple test domains.
Comment: Submitted to ICASSP 202

arXiv.org e-Print Archive

A Cornucopia of Iridium Nitrogen Compounds Produced from Laser‐Ablated Iridium Atoms and Dinitrogen

Authors: Beckers Helmut, Riedel Sebastian, Stüker Tony
Publication date: 01/01/2020

The reaction of laser‐ablated iridium atoms with dinitrogen molecules and nitrogen atoms yields several neutral and ionic iridium dinitrogen complexes such as Ir(N2), Ir(N2)+, Ir(N2)2, Ir(N2)2−, IrNNIr, as well as the nitrido complexes IrN, Ir(N)2 and IrIrN. These reaction products were deposited in solid neon, argon and nitrogen matrices and characterized by their infrared spectra. Assignments of vibrational bands are supported by ab initio and first-principles calculations as well as 14/15N isotope substitution experiments. The structural and electronic properties of the new dinitrogen and nitrido iridium complexes are discussed. While the formation of the elusive dinitrido complex Ir(N)2 was observed in a subsequent reaction of IrN with N atoms within the cryogenic solid matrices, the threefold coordinated iridium trinitride Ir(N)3 has not been observed so far.

Institutional Repository of the Freie Universität Berlin

Towards Improving Low-Resource Speech Recognition Using Articulatory and Language Features

Authors: Müller Markus, Stüker Sebastian, Waibel Alexander
Publication venue: Association for Computational Linguistics
Publication date: 03/01/2024

In an increasingly globalized world, there is a rising demand for speech recognition systems. Systems for languages like English, German or French achieve decent performance, but there is a long tail of languages for which such systems do not yet exist. State-of-the-art speech recognition systems feature Deep Neural Networks (DNNs). Because DNNs are data-driven and therefore highly dependent on sufficient training data, a lack of resources directly affects recognition performance. There are multiple techniques for dealing with such resource-constrained conditions; one approach is the use of additional data from other languages. In the past, it was demonstrated that multilingually trained systems benefit from adding language feature vectors (LFVs) to the input features, similar to i-Vectors. In this work, we extend this approach by adding articulatory features (AFs). We show that AFs also benefit from LFVs and that multilingual system setups benefit from adding both AFs and LFVs. Treating English as a low-resource language, we restricted ourselves to only 10h of English acoustic training data. For system training, we use additional data from French, German and Turkish. By using a combination of AFs and LFVs, we were able to decrease the WER from 18.1% to 17.3% after system combination in our setup using a multilingual phone set.

KITopen